Abstract

In this paper I report on an investigation into the
problem of assigning tones to pitch contours. The
proposed model is intended to serve as a tool for
phonologists working on instrumentally obtained
pitch data from tone languages. Motivation and
exemplification for the model are provided by data
taken from my fieldwork on Bamileke Dschang
(Cameroon). Following recent work by Liberman
and others, I provide a parametrised F0 prediction
function V which generates F0 values from a tone
sequence, and I explore the asymptotic behaviour
of downstep. Next, I observe that transcribing a
sequence X of pitch (i.e. F0) values amounts to
finding a tone sequence T such that V(T) ≈ X.
This is a combinatorial optimisation problem, for
which two non-deterministic search techniques are
provided: a genetic algorithm and a simulated
annealing algorithm. Finally, two implementations,
one for each technique, are described and then
compared using both artificial and real data for
sequences of up to 20 tones. These programs can be
adapted to other tone languages by adjusting the
F0 prediction function.

INTRODUCTION 

The wealth of literature on tone and intonation
has amply demonstrated that voice pitch (F0) in
speech is under independent linguistic control. In
English, voice pitch alone can signal the distinction
between a statement and a question. Similarly, in
many tone languages, voice pitch alone signals the
tense of a verb. Phonologists usually describe a
pitch contour much as they describe speech more
generally, namely as a sequence of discrete units
(i.e. a transcription). This is illustrated in
Figure 1, where L indicates a low tone and ↓H
indicates a downstepped high tone. The question
addressed in this paper concerns how we should
relate pitch contours to tone sequences.

This paper is divided into four main sections, 
summarised in turn below. 



Tone Transcription In this section I present
the problem of relating sequences of F0 values
to tone transcriptions. I argue that Hidden
Markov Models are unsuited to the task and I
demonstrate the importance of having a
computational tool which allows phonologists to
experiment with F0 scaling parameters.

F0 Scaling This section gives a mathematical
basis for a general approach to F0 scaling which,
it is hoped, will be applicable to any tone
language. I derive an F0 prediction function from
first principles and show how the model of
Liberman et al. (1993) for the Nigerian language
Igbo is a special case.

Tone and F0 in Bamileke Dschang Here I
present some data from my own fieldwork and
give a statistical analysis, using the same
technique used by Liberman et al. I then show how
the general model of the previous section is
instantiated for this language. This demonstrates
the versatility of the general model, since it can
be applied to two very different tone languages.

Implementations This section provides two
non-deterministic techniques for transcribing an
F0 string. The first method uses a genetic
algorithm while the second method uses simulated
annealing. The performance of both
implementations is evaluated and compared on a
range of artificial and real data. Finally, I give
some examples of multiple, automatically generated
transcriptions of the same F0 data.
[Figure 1 labels: tones ↓H L ↓H L ... (nine ↓H L pairs); syllables mo mbo mo mbo ...]

Figure 1: F0 Trace for Bamileke Dschang Utterance: 'child and child and ...'



TONE TRANSCRIPTION 



A Tool for Phonologists 



Generation and Recognition 

A promising way of generating contours from tone
sequences is to specify one or more pitch targets
per tone and then to interpolate between the
targets; the task then becomes one of providing
a suitable sequence of targets (Pierrehumbert &
Beckman, 1988). It is perhaps less clear how we
should go about recognising tone sequences from
pitch contours. Hidden Markov Models (HMMs)
(Huang et al., 1990) offer a powerful statistical
approach to this problem, though it is unclear how
they could be used to recognise the units of
interest to phonologists. HMMs do not encode
timing information in a way that would allow them
to output, say, one tone per syllable (or vowel).
Moreover, the same section of a pitch contour may
correspond to either H or L tones. For example,
an H between two Hs looks just like an L between
two Ls. There is no principled upper bound on the
amount of context that needs to be inspected in
order to resolve the ambiguity, leading to a
multiplication of the state information required
by the HMM and to problems for training it.

In the present context, the emphasis is not 
on automatic speech recognition but on a tool to 
support phonologists working with tone. As we 
shall see in the next section, once the phonologist 
has identified the salient location to measure the 
'F0 value' of a syllable (or some other
phonological unit), the task will be to automatically
map a string of these values to a string of tones.



Connell and Ladd have devised a set of heuristics
for identifying key points in an F0 contour to
record F0 values (Connell & Ladd, 1990, 21ff). In
the absence of a program which enshrines these
heuristics, it was decided to develop a system for
producing a tone transcription from a sequence of
F0 values. Apart from the obvious benefits of
automating the process, such as speed and accuracy,
it could show up cases where there is more than
one possible tone transcription, possibly with
different parameter settings for the F0 scaling
function. Having the set of tone transcriptions that
are compatible with an utterance has considerable
value to an analyst searching for invariances in the
tonal assignments to individual morphemes.

To exemplify this point, it is worth considering
a recent example where an alternative transcription
of some data proved valuable in providing a
fresh analysis of the data. In their analyses of tone
in Bamileke Dschang, Hyman gives the transcription
in (1a) while Stewart gives the one in (1b)
for the phrase meaning 'machete of dogs'.



(1) a. Jljli m5mj,bh4      (Hyman, 1985, 50)
    b. jijiit ' im5mbh4    (Stewart, 1993, 200)



These two possibilities exist because of different F0
scaling parameters. These parameters determine
the way in which the different tones are scaled
relative to each other and to the speaker's pitch
range. This is illustrated in (2), adapting Hyman's
earlier notation (Hyman, 1979).



(2) a. Hyman: jijii mamjbhtt
    b. Stewart: jljlit ' |m£)mbh4

Example (2) displays a kind of phonetic
interpretation function. Immediately below the two rows
of tones we see a row of numbers corresponding to
the tones. For Hyman, L=3 and H=1, while for
Stewart, L=2 and H=1. Observe in Hyman's
example that a rising tone, symbolised by a wedge
above the i, is modelled as an LH sequence in
keeping with standard practice in African tone
analysis.

The second row of numbers corresponds to
downstep (↓) and upstep (↑). For Hyman's model,
this row begins at 0 and is increased by 1 for each
downstep encountered. For Stewart's model, this
row begins at 1 and is increased by 1 for each
downstep encountered and decreased by 1 for each
upstep encountered. The two rows are summed
vertically to give the last row of numbers. Observe
that the last rows of Stewart's and Hyman's
models are identical.
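
The summation just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `interpret`, the ASCII markers `!` (downstep) and `^` (upstep), and the two four-tone sequences are all hypothetical and are not the transcriptions in (2).

```python
def interpret(tones, low, high, start):
    """Map a tone string to integer pitch levels (larger = lower pitch).

    A leading '!' marks downstep and '^' marks upstep; both markers are
    assumptions of this sketch.
    """
    register = start
    levels = []
    for tone in tones:
        if tone.startswith("!"):
            register += 1   # each downstep adds 1 to the register row
        elif tone.startswith("^"):
            register -= 1   # each upstep subtracts 1 again
        base = high if tone.endswith("H") else low
        levels.append(base + register)  # vertical sum of the two rows
    return levels

# Hypothetical sequences: partial-downstep settings (L=3, H=1, start 0)
# versus total-downstep settings (L=2, H=1, start 1).
hyman = interpret(["L", "L", "H", "!H"], low=3, high=1, start=0)
stewart = interpret(["L", "L", "^H", "L"], low=2, high=1, start=1)
print(hyman, stewart)
```

Two differently transcribed strings can therefore yield the same final row of numbers once the parameter settings differ.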

The parameter which distinguishes the two
approaches is partial vs. total downstep. Hyman
treats Dschang as a partial downstep language,
i.e. where ↓H appears as a mid tone (with respect
to the material to its left). Stewart treats it as a
total downstep language, i.e. where ↓H appears as
an L tone (with respect to the material to its left).

While Hyman and Stewart present rather dif- 
ferent analyses of rather different looking tran- 
scriptions, we can see that they are really analyz- 
ing the same data, given the above interpretation 
function. Therefore, phonologists who do not wish 
to limit themselves to the transcriptions which re- 
sult from certain parameter settings in the pho- 
netic interpretation function would be better off 
working directly with number sequences like the 
last row in (2). This paper describes a tool which
lets them do just that. 



F0 SCALING

Consider again the F0 contour in Figure 1. In
particular, note that the F0 decay seems to be to
a non-zero asymptote, and that H and L appear to
have different asymptotes which we symbolise as h
and l respectively. These observations are clearer
in Figure 2, which (roughly speaking) displays the
peaks and valleys from Figure 1.

Figure 2: Asymptotic Behaviour of F0

Although this is admittedly a rather artificial
example, it remains true that there is no principled
upper limit on the number of downsteps that
can occur in an utterance (Clements, 1979, 540),
and so the asymptotic behaviour of F0 scaling still
needs to be addressed.

Now suppose that we have a sequence T of
tones where $t_i$ is the $i$th tone (H or L) and a
sequence X of F0 values where $x_i$ is the F0 value
corresponding to $t_i$. Then we would like a formula
which predicts $x_i$ given $x_{i-1}$, $t_{i-1}$ and
$t_i$ ($i > 1$). We express this as follows:

$$x_i = V_{t_{i-1} t_i}(x_{i-1})$$



The question, now, is what should this function
look like? Suppose for sake of argument that the
ratio of L to the immediately preceding H in
Figure 2 is constant, with respect to the baselines
for H and L, namely h and l. Then we have:

$$\frac{x_i - l}{x_{i-1} - h} = \text{constant}$$

More generally, suppose that we have a sequence
of two arbitrary tones. Ignoring the possibility of
downstep for the present, we have a static two-tone
system where HH and LL sequences are level
and sequences like HLHLHL are realised as simple
oscillation between two pitches. We can write the
following formula, where $\bar{t}_i = h$ if $t_i$ = H and
$\bar{t}_i = l$ if $t_i$ = L.

$$\frac{x_i}{x_{i-1}} = \frac{\bar{t}_i}{\bar{t}_{i-1}}$$

The situation becomes more interesting when we
allow for downdrift and downstep. Downdrift is
the automatic lowering of the second of two H
tones when an L intervenes, so HLH is realised
as [¯ _ -] rather than as [¯ _ ¯], while downstep is the
lowering of the second of two tones when an
intervening L is lost, so H↓H is realised as [¯ -]
(Hyman & Schuh, 1974). Bamileke Dschang has
downstep but not downdrift while Igbo has downdrift
but only very limited downstep. Now we define
$\bar{t}_i = h$ if $t_i$ = H, ↓H and $\bar{t}_i = l$ if
$t_i$ = L, ↓L. Generalising our equation once more,
we have the following, where R is a factor called
the transition ratio.

$$\frac{x_i - \bar{t}_i}{x_{i-1} - \bar{t}_{i-1}} = R_{t_{i-1} t_i}\,\frac{\bar{t}_i}{\bar{t}_{i-1}}$$

$$x_i = V_{t_{i-1} t_i}(x_{i-1}) = R_{t_{i-1} t_i}\,\frac{\bar{t}_i}{\bar{t}_{i-1}}\,x_{i-1} + \bar{t}_i\,(1 - R_{t_{i-1} t_i})$$
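
As a rough illustration of the one-step prediction function, the following Python sketch computes $x_i$ from $x_{i-1}$, the two tones and a transition ratio. The names `baseline`, `predict` and `static`, and the values h = 100, l = 80 and the starting pitch of 120 Hz, are assumptions made for this example only.

```python
def baseline(tone, h, l):
    """Asymptote of a tone: h for H-type tones, l for L-type tones."""
    return h if tone.endswith("H") else l

def predict(x_prev, t_prev, t, h, l, ratio):
    """One step of V: predict x_i from x_{i-1} and the tone pair.

    `ratio(t_prev, t)` must return the transition ratio R for the pair.
    """
    b_prev, b = baseline(t_prev, h, l), baseline(t, h, l)
    r = ratio(t_prev, t)
    return r * (b / b_prev) * x_prev + b * (1.0 - r)

def static(t_prev, t):
    """Static two-tone system: R = 1 for every pair of tones."""
    return 1.0

# With R = 1 the sequence HLHLH simply oscillates between two pitches.
x = 120.0
contour = [x]
for t_prev, t in zip("HLHL", "LHLH"):
    x = predict(x, t_prev, t, h=100.0, l=80.0, ratio=static)
    contour.append(x)
print(contour)
```

Setting R = 1 recovers the static system; values of R below 1 pull successive F0 values towards the asymptote of the current tone.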
Now I shall show how this general equation relates
to the equations for Igbo (Liberman et al., 1993,
151), reproduced below:

(3)  HH:   $x_i = x_{i-1}$
     HL:   $x_i = (Fl/h)\,x_{i-1} + l(1 - F)$
     LH:   $x_i = (h/l)\,x_{i-1}$
     LL:   $x_i = F x_{i-1} + l(1 - F)$
     H↓H:  $x_i = D x_{i-1} + h(1 - D)$

V can be instantiated to the set of equations in
(3) by setting R as follows:

$$R_{HH} = R_{LH} = 1, \quad R_{HL} = R_{LL} = F, \quad R_{H{\downarrow}H} = D, \qquad 0 < F < 1, \; 0 < D < 1$$
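
The instantiation can be checked numerically. The sketch below compares the general formula against each equation in (3); the parameter values (h = 120, l = 90, F = 0.8, D = 0.9) and the test pitches are hypothetical, chosen only to exercise the algebra.

```python
def predict(x_prev, b_prev, b, r):
    """General one-step formula: baselines b_prev, b and transition ratio r."""
    return r * (b / b_prev) * x_prev + b * (1.0 - r)

h, l, F, D = 120.0, 90.0, 0.8, 0.9  # hypothetical parameter values

# (previous baseline, current baseline, R, Igbo equation from (3))
cases = {
    "HH":  (h, h, 1.0, lambda x: x),
    "HL":  (h, l, F,   lambda x: (F * l / h) * x + l * (1 - F)),
    "LH":  (l, h, 1.0, lambda x: (h / l) * x),
    "LL":  (l, l, F,   lambda x: F * x + l * (1 - F)),
    "H!H": (h, h, D,   lambda x: D * x + h * (1 - D)),
}

# Largest disagreement between the general formula and (3).
max_gap = max(abs(predict(x, b_prev, b, r) - equation(x))
              for b_prev, b, r, equation in cases.values()
              for x in (95.0, 110.0, 140.0))
print(max_gap)
```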
It will be helpful to introduce one more level
of generality. V relates adjacent F0 values, but
we would also like to relate non-adjacent values,
given the sequence of intervening tones. Suppose
that $T = t_0 \cdots t_n$ is a tone sequence where the F0
value of $t_0$ is x. Then we shall write the F0 value
of $t_n$ as $V_T(x)$. By repeated applications of V we
can write down the following expression for $V_T$:

$$V_T(x) = \frac{\bar{t}_n}{\bar{t}_0}\,R_T\,x + \bar{t}_n\,(1 - R_T)$$

where $R_T = \prod_{k=1}^{n} R_{t_{k-1} t_k}$, $n \geq 2$. Now, suppose
that $S = s_0 \cdots s_m$ and $T = t_0 \cdots t_n$ are tone
sequences and that $s_0 = t_0$, $s_m = t_n$ and $V_S = V_T$.
Then it is straightforward to show that $R_S = R_T$.
Notice also that if $V_T(x) = x$ for all x and if
$t_0 = t_n$ then $R_T = 1$. These results will be useful
in the next section.
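
The closed form for $V_T$ can be compared with repeated one-step application. The following sketch does this for one hypothetical sequence; the function names, the baselines (h = 100, l = 80) and the three ratios are assumptions of the example.

```python
import math

def step(x, b_prev, b, r):
    """One application of V with baselines b_prev, b and ratio r."""
    return r * (b / b_prev) * x + b * (1.0 - r)

def apply_repeatedly(x, baselines, ratios):
    """Apply V once per transition t_{k-1} -> t_k."""
    for k, r in enumerate(ratios, start=1):
        x = step(x, baselines[k - 1], baselines[k], r)
    return x

def closed_form(x, baselines, ratios):
    """V_T(x) using the product R_T of the individual ratios."""
    r_total = math.prod(ratios)
    return (baselines[-1] / baselines[0]) * r_total * x \
        + baselines[-1] * (1.0 - r_total)

h, l = 100.0, 80.0
baselines = [h, l, h, h]   # baselines of a four-tone sequence
ratios = [0.8, 1.0, 0.9]   # one hypothetical ratio per transition
print(apply_repeatedly(130.0, baselines, ratios),
      closed_form(130.0, baselines, ratios))
```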

Finally, it is worth comparing V with Hyman's
and Stewart's interpretation functions which were
illustrated in (2). As pointed out already, Hyman's
is a partial downstep model while Stewart's
is a total downstep model. Partial and total
downstep can be visualised as follows, where the
dotted lines indicate the abstract register inside which
tones are scaled, and where downstep corresponds
to lowering of the register.

[Diagram: register lowering under partial downstep and under total downstep]

Observe that for partial downstep, it is necessary
to have two downsteps before a high tone is at
the level of a preceding low, while for total
downstep, it is only necessary to have a single
downstep for a high tone to be at the same level as
the preceding low. We can express these
observations about partial and total downstep in the
model as follows. For partial downstep, we have
$V_{L{\downarrow}H{\downarrow}H}(x) = x$ while for total downstep we have
$V_{L{\downarrow}H}(x) = x$. For both of these equations we are
forced to have h = l, which does not seem to be
empirically justifiable in view of the data in Figure 1.
It might be argued that this indicates a flaw in the
model being presented here, since partial and total
downstep are widely attested in the literature on
tone languages. Unfortunately, it is not possible
in general to provide a model for partial or total
downstep which permits distinct asymptotes for H
and L [1]. Therefore, to the extent that Figure 1 is
typical of tone languages in having different H and
L asymptotes, one must conclude that total and
partial downstep are qualitative terms only.
However, they may yet re-emerge in the model under
a different guise, as we shall see later.

[1] To see why this is so for the case of total
downstep, suppose that such a model did exist, and so
l < h. Let $x \in [l, h)$, a valid F0 value for a low
tone. Now, whatever interpretation function V' we
use, we still require that $V'_{L{\downarrow}H}(x) = x$ by
definition of total downstep, which means that there
is now a high tone with an F0 value less than h. But
h is the asymptote below which no high tones should
ever be realised, and so we have a contradiction.
The case for partial downstep follows similarly.

The effect of the distinction between partial
and total downstep is to allow different
transcriptions of the same string, as we saw in (1). In
general, we have the following mapping between
transcriptions under the two views of downstep:

It is clear that changing from one view of
downstep to the other amounts to adding and
deleting ↓ and ↑ while leaving the tones themselves
unchanged. Thus, the model admits both
transcription schemes that result from the two views
of downstep, and another besides, as shown later.

This concludes the discussion of the F0
prediction function. In the next section I shall
investigate the phonetic interpretation of tone in
Bamileke Dschang, and determine the values of
R for this language.

TONE AND F0 IN BAMILEKE DSCHANG

In a recent field trip to Western Cameroon to
study the Bamileke Dschang [2] noun associative
construction, I was able to collect a small amount
of data relating to F0 scaling throughout a
particular informant's pitch range. Following
Liberman et al., voice pitch was varied by getting the
informant to speak at different volumes and by
adjusting the recording level appropriately.
However, rather than asking the informant to
imagine speaking to a subject at different distances,
I controlled the volume by having the informant
wear headphones and played white noise from a
detuned radio. Thus, I could set the informant's
voice pitch by using the volume control on my
radio. My hypothesis is that this technique produces
more consistent volume (and hence, pitch scaling)
over long utterances and may make informants less
self-conscious about speaking loudly than simply
asking them to imagine speaking to subjects at
various distances away. Measurements were taken
from the following data.

[2] Bamileke Dschang is a Grassfields Bantu language
spoken in the Western Province of Cameroon. The
name 'Bamileke' (pron: [ba'mileke]) represents both
an ethnic grouping and a language cluster; Dschang
(pron: [tʃaŋ]) is an important town around which one
of the Bamileke languages is spoken. The data here is
from the Bafou dialect.

(5) HH d 3«6 sag te ngao tag te ngao tug te 
ngtto kdp te ngao kop 
He saw the bird before he saw the hat before 
he saw the basket before he saw the pipe 
before he saw the cup 

LL dpdk — side, half 
L↓H, HL

ej,s5 mb5 ejsD mb5 ... e|s6 
jealousy and jealousy and ... jealousy 
15ip5 mbo 15ip5 mb5 ... lajpa 
breast and breast and ... breast 
meiv8t mb5 meivet mb5 ... melvet 
oil and oil and ... oil 
|m6 mb5 jmo mbo ... jmo 
child and child and ... child 

Regrettably, the LL data was only available
from isolated disyllables, and other sequences such
as LH and H↓H were not available at all. However,
from the F0 data for the above utterances
we can hypothesise the behaviour of these unseen
sequences, and this can be tested in subsequent
empirical work. The results for utterances
involving HH and LL sequences are displayed in Figure
3, while results for L↓H and HL are displayed in
Figure 4.

The regression equations obtained from these
data are displayed in (6), where the number of
occurrences of each tone sequence is given in
parentheses after the sequence. The third column gives
the standard error for the gradient and intercept.

(6)  Sequence    Equation                        Error
     HH (119)    x_i = 0.99 x_{i-1} + 0.91       0.012, 5.0
     LL (11)     x_i = 1.02 x_{i-1} - 1.39       0.057, 3.6
     HL (40)     x_i = 0.65 x_{i-1} + 25.0       0.015, 3.1
     L↓H (38)    x_i = 1.10 x_{i-1} + 0.54       0.026, 4.3



From this, we conclude that HL is the only
sequence with an intercept significantly different
from zero, and that $x_i = x_{i-1}$ for HH and LL
sequences. We also conclude that $R_{HH} = R_{LL} =
R_{L{\downarrow}H} = 1$ ($h/l = 1.1$) and $R_{HL} = 0.72$. This
last value will be referred to as the quantity d.
We also see that l = 88 Hz and h = 96 Hz.
Fortunately, these figures are sufficient to determine the
R values for all other pairs of tones in Bamileke
Dschang.

Figure 3: Plot of x_{i-1} vs x_i for HH, LL

Figure 4: Plot of x_{i-1} vs x_i for L↓H, HL
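
As a quick consistency check, the gradients and intercepts implied by the model can be computed from these values and compared by eye with the regression table. The sketch below assumes only the general formula for V and the figures quoted above (h = 96, l = 88, d = 0.72); the function and variable names are hypothetical.

```python
h, l, d = 96.0, 88.0, 0.72  # values quoted in the text

def line(b_prev, b, r):
    """Gradient and intercept of x_i = r*(b/b_prev)*x_{i-1} + b*(1 - r)."""
    return r * (b / b_prev), b * (1.0 - r)

slope_hl, intercept_hl = line(h, l, d)      # HL with R = d
slope_ldh, intercept_ldh = line(l, h, 1.0)  # L followed by downstepped H, R = 1

print("HL :", round(slope_hl, 3), round(intercept_hl, 2))
print("L!H:", round(slope_ldh, 3), round(intercept_ldh, 2))
```

The HL values land close to the fitted 0.65 and 25.0, and the L↓H gradient is close to the fitted 1.10 with a zero intercept.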

A further observation is that Bamileke
Dschang does not have downdrift, and so there is
no F0 difference across HLH and LHL sequences.
This is evident in Figure 5. Therefore, we can
write $V_{HLH}(x) = x$, and by a result we showed
above, $R_{HL} R_{LH} = 1$. Given that $R_{HL} = d$ it
follows that $R_{LH} = d^{-1}$.

Concerning downstep, I shall assume that the
magnitude of downstep is independent of the tones
on either side, and so $V_{HL{\downarrow}H} = V_{H{\downarrow}H} = V_{L{\downarrow}L} =
V_{LH{\downarrow}L}$. A separate instrumental study supports
this hypothesis (Bird & Stegen, 1993). Therefore,
we have $R_{s{\downarrow}t} = d\,R_{st}$, where s is any tone
and t is H or L.



Finally, it is important to briefly consider
upstep, since it has been used in some analyses of
Bamileke Dschang (e.g. Stewart's). Given that
upstep and downstep are intended as inverses of each
other, we have the identities $V_{s{\downarrow}t{\uparrow}u} = V_{stu} =
V_{s{\uparrow}t{\downarrow}u}$, with s, t as before. We now have a
complete table for R:

                          t_i
 t_{i-1}        H      L      ↓H     ↓L     ↑H      ↑L
 H, ↓H, ↑H      1      d      d      d^2    d^-1    1
 L, ↓L, ↑L      d^-1   1      1      d      d^-2    d^-1



Observe the symmetries in this table. The
configuration of four R values that we find when $t_i$ is
not downstepped or upstepped (the first two columns)
is reproduced in the columns for downstep
(multiplied by d) and in the columns for upstep
(divided by d).
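
The table can be encoded compactly by exploiting these symmetries. The sketch below is one possible encoding, assuming the ASCII markers `!` for downstep and `^` for upstep and the values h = 96, l = 88, d = 0.72 quoted earlier; the starting pitch of 130 Hz and the function names are hypothetical.

```python
def ratio(t_prev, t, d):
    """R for a tone pair: base 2x2 block, times d for downstep, over d for upstep."""
    base = {"HH": 1.0, "HL": d, "LH": 1.0 / d, "LL": 1.0}[t_prev[-1] + t[-1]]
    if t.startswith("!"):
        return base * d
    if t.startswith("^"):
        return base / d
    return base

def contour(tones, x0, h=96.0, l=88.0, d=0.72):
    """Predicted F0 values for a tone sequence, starting from x0."""
    xs = [x0]
    for t_prev, t in zip(tones, tones[1:]):
        b_prev = h if t_prev[-1] == "H" else l
        b = h if t[-1] == "H" else l
        r = ratio(t_prev, t, d)
        xs.append(r * (b / b_prev) * xs[-1] + b * (1.0 - r))
    return xs

print(contour(["H", "L", "H"], 130.0))    # no downdrift: returns to the start
print(contour(["H", "!H", "^H"], 130.0))  # upstep undoes downstep
```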

Note also that the above table is dependent
upon how the data in (5) was transcribed.
Suppose that we had not used repetitions of HL↓H
(a transcription scheme based on partial
downstep) but H↓LH (a scheme based on total
downstep). Then we would have had $R_{H{\downarrow}L} = d$ and
$R_{LH} = 1$. Accordingly, the table for R would be
as follows:

                          t_i
 t_{i-1}        H      L      ↓H     ↓L     ↑H      ↑L
 H, ↓H, ↑H      1      1      d      d      d^-1    d^-1
 L, ↓L, ↑L      1      1      d      d      d^-1    d^-1



The fact that we have two possible tables for
R is no cause for alarm. Recall that the transition
between two tones $t_{i-1}$ and $t_i$ also involves the
factor $\bar{t}_i / \bar{t}_{i-1}$. This factor is manifested in tone
transitions according to the following pattern:

                          t_i
 t_{i-1}        H      L      ↓H     ↓L     ↑H     ↑L
 H, ↓H, ↑H      1      l/h    1      l/h    1      l/h
 L, ↓L, ↑L      h/l    1      h/l    1      h/l    1



I therefore conclude that the presence of more
than one table for R indicates an interplay
between R values and the ratio h/l. This raises
an interesting question. Suppose we have two
tone sequences $T = t_0 \cdots t_n$ and $T' = t'_0 \cdots t'_n$,
and two interpretation functions V and V' based
on R and R' respectively. Then under what
circumstances is the phonetic interpretation of both
sequences the same under their respective
interpretation functions? A sufficient condition for
them to be the same is that $\bar{t}_i = \bar{t}'_i$ and that
$R_{t_{i-1} t_i} = R'_{t'_{i-1} t'_i}$. The reader can check that these
conditions are met by the mapping in (4) and the
two tables for R given above. Note that this
observation holds for the model in general, not just
for the specialised version of the model as applied
to Bamileke Dschang.

Figure 5: F0 Trace for 'bird and bird and ...'

It can also be shown that R is completely
determined once $R_{HL}$ is specified. A possible
characterisation of total vs. partial downstep now arises:
if $R_{HL} = 1$ then we have total downstep, but if
$R_{HL} = d < 1$ then we have partial downstep.
However, the interpretation of these terms must
necessarily be different from the standard
interpretation, since I have shown that the standard
interpretation is not compatible with the present
model.

This concludes the discussion of F0 scaling in
Bamileke Dschang. I shall now present the imple- 
mentations. 



IMPLEMENTATIONS 



In this section, I show how it is possible to get
two programs to produce a sequence of tones T
(i.e. a tone transcription) given a sequence of n
F0 values X. The programs make crucial use of
the prediction function V in evaluating candidate
tone transcriptions.

Both programs involve search, and in general,
the aim in searching is to discover the values for
$x_1, \ldots, x_n$ so as to optimise the value of a specified
evaluation function $f(x_1, \ldots, x_n)$. When f has
many local optima, deterministic methods such as
hill-climbing perform poorly. This is because they
terminate in a local optimum and the particular
one found depends heavily on the starting point in
the search, and there is usually no way of choosing
a good starting point.

Exhaustive search for the global optimum
is not an option when the search space is
prohibitively large. In the present context, say for
a sequence of 20 tones, the search space contains
6^20 ≈ 3.7 × 10^15 possible tone transcriptions, and
for each of these there are thousands of possible
parameter settings. This is too large a search space
for exhaustive search in a reasonable amount of
computation time.

Non-deterministic search methods have been
devised as a way of tackling large-scale
combinatorial optimisation problems, problems that involve
finding optima of functions of discrete variables.
These methods are only designed to yield an
approximate solution, but they do so in a reasonable
amount of computation time. The best known
such methods are genetic search (Goldberg, 1989)
and annealing search (van Laarhoven & Aarts,
1987). Recently, annealing search has been
successfully applied to the learning of phonological
constraints expressed as finite-state automata
(Ellison, 1992). In the following sections I describe a
genetic algorithm and an annealing algorithm for
the tone transcription problem.



A Genetic Algorithm 

For a cogent introduction to genetic search and an 
explanation of why it works, the reader is referred 



to (South et al., 1993). Before presenting the ver- 
sion of the algorithm used in the implementation, 
I shall informally define the key data types it uses 
along with the standard operations on those types. 

gene A linear encoding of a solution. In the
present setting, it is an array of n tones, where
each tone is one of H, ↓H, ↑H, L, ↓L or ↑L.
A gene also contains 16 bit encodings of the
parameters h, l and d. These encodings were
scaled to be floating point numbers in the range
[90, 110] for h, [70, 100] for l and [0.6, 0.9] for d.

gene pool An array of genes, P. One of the 
search parameters is the size of P, known as 
the population. The gene pool is renewed each 
generation, and the number of generations is an- 
other search parameter. 

evaluation A measure of the fitness of a gene as
a solution to the problem. Suppose that X is
the sequence of F0 values we wish to transcribe.
Suppose also that T is a particular gene. Then
the evaluation function is as follows:

$$\frac{1}{n} \sum_{i=2}^{n} \left( V_{t_{i-1} t_i}(x_{i-1}) - x_i \right)^2$$

crossover This is an operation which takes two
genes and produces a single gene as the
result. Suppose that $A = a_1 \cdots a_n$ and
$B = b_1 \cdots b_n$. Then the crossover function $C_r$ is
defined as follows, where r is the (randomly
selected) crossover point ($0 < r < n$).

$$C_r(a_1 \cdots a_r a_{r+1} \cdots a_n,\; b_1 \cdots b_r b_{r+1} \cdots b_n) = a_1 \cdots a_r b_{r+1} \cdots b_n$$

In other words, the genes A and B are cut at 
a position determined by r and the first part of 
A is spliced with the second part of B to create 
a new gene. Crossover builds in the idea that 
good genes tend to produce good offspring. To 
see why this is so, suppose that the transcrip- 
tion contained in the first part of A is relatively 
good while the rest is poor, while the transcrip- 
tion contained in the first part of B is poor and 
the rest is relatively good. Then the offspring 
containing the first part of A and the second 
part of B will be an improvement on both A 
and B; other possible offspring from A and B 



will be significantly worse and may not survive
to the next generation. The program performs
this kind of crossover for the parameters h, l
and d, employing independent crossover points
for each, and randomising the argument order
in $C_r$ so that the high order bits in the offspring
are equally likely to come from either parent.

An extension to crossover allows more than one
crossing point. The current model permits an
arbitrary number of crossing points for crossover
on the transcription string. The resulting gene
is optimal since we choose the crossing points in
such a way as to minimise $(V_{t_{i-1} t_i}(x_{i-1}) - x_i)^2$
at each position. In developing the system,
exploiting the decomposability of the evaluation
function in this way caused a significant
improvement in system performance over the
version which used simple crossover.

breeding For each generation, we create a new 
gene pool from the previous one. Each new gene 
is created by mating the best of three randomly 
chosen genes with the best of three other ran- 
domly chosen genes. 

mutation In order to maintain some genetic di- 
versity and an element of randomness through- 
out the search (rather than just in the initial 
configuration) , a further operation is applied to 
each gene in every generation. With a certain 
probability (known as the mutation probability) , 
for each gene T and each tone in T, the tone 
is randomly set to any of the six possible tones. 
Likewise, the parameter encodings are mutated. 
The mutation rate is set to 0.005 but raised to 
0.5 for a single generation if the evaluation of 
the best gene is no improvement on the evalu- 
ation of the best gene ten generations earlier. 
The best gene is never mutated. 
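
The simple crossover and mutation operations defined above can be sketched as follows. This is a minimal illustration on the tone string only (the 16-bit parameter encodings and the multi-point extension are left out); the ASCII markers `!` and `^`, the function names and the example genes are assumptions.

```python
import random

TONES = ["H", "!H", "^H", "L", "!L", "^L"]  # '!' = downstep, '^' = upstep

def crossover(a, b, r):
    """Single-point crossover C_r: the first r tones of a, then the rest of b."""
    if len(a) != len(b) or not 0 < r < len(a):
        raise ValueError("need equal-length genes and 0 < r < n")
    return a[:r] + b[r:]

def mutate(gene, rate, rng):
    """Reset each tone to a random one of the six tones with probability `rate`."""
    return [rng.choice(TONES) if rng.random() < rate else t for t in gene]

# Hypothetical parent genes.
A = ["H", "L", "H", "L", "H", "L"]
B = ["L", "L", "!H", "!H", "L", "H"]
child = crossover(A, B, 2)
print(child)
print(mutate(child, 0.5, random.Random(0)))
```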

The building blocks of genetic search dis- 
cussed above are structured into the following al- 
gorithm, expressed in pseudo-Pascal: 

procedure genetic_search
begin
  initialise Pool, NewPool;
  for g := 1 to generations do
  begin
    { low mutation rate unless the best gene has stalled }
    if good_performance(10) then
      mutation_rate := 0.005
    else
      mutation_rate := 0.5;
    { the best gene is copied across unmutated }
    NewPool[1] := find_best_gene(Pool);
    for n := 2 to population do
    begin
      gene1 := best_of_three(Pool);
      gene2 := best_of_three(Pool);
      NewPool[n] := crossover(gene1, gene2);
      mutate(NewPool[n], mutation_rate);
    end;
    Pool := NewPool;
    evaluate(Pool);
  end;
  write find_best_gene(Pool);
end

The main loop is executed for each generation. 
Each time through this loop, the program checks 
performance over the last ten generations and if 
performance has been good, the mutation rate 
stays low, otherwise it is changed to high. Then 
it copies the best gene to the new pool. Now we 
reach the inner loop, which selects two genes, per- 
forms crossover, and mutates the result. Next, 
the current pool is updated, an evaluation is per- 
formed, and the program continues with the next 
generation. Once all the generations have been 
completed, the program displays the best gene 
from the final population and terminates. 
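
For readers who want something executable, the loop above might be sketched in Python as follows. This is not the implementation described in the paper: it holds h, l and d fixed rather than encoding them in the gene, uses single-point crossover only, assumes the first table for R and a mean-squared-error evaluation, and the default search settings and the test sequence are hypothetical.

```python
import random

TONES = ["H", "!H", "^H", "L", "!L", "^L"]  # '!' = downstep, '^' = upstep

def predict(x_prev, t_prev, t, h, l, d):
    """One step of V, assuming the first R table for Bamileke Dschang."""
    r = {"HH": 1.0, "HL": d, "LH": 1.0 / d, "LL": 1.0}[t_prev[-1] + t[-1]]
    if t[0] == "!":
        r *= d
    elif t[0] == "^":
        r /= d
    b_prev = h if t_prev[-1] == "H" else l
    b = h if t[-1] == "H" else l
    return r * (b / b_prev) * x_prev + b * (1.0 - r)

def evaluate(tones, xs, h, l, d):
    """Mean squared prediction error (lower is better)."""
    return sum((predict(xs[i - 1], tones[i - 1], tones[i], h, l, d) - xs[i]) ** 2
               for i in range(1, len(xs))) / len(xs)

def genetic_search(xs, h, l, d, population=40, generations=30, seed=0):
    rng = random.Random(seed)

    def fitness(gene):
        return evaluate(gene, xs, h, l, d)

    def best_of_three(pool):
        return min(rng.sample(pool, 3), key=fitness)

    pool = [[rng.choice(TONES) for _ in xs] for _ in range(population)]
    history = []  # best evaluation at the start of each generation
    for g in range(generations):
        best = min(pool, key=fitness)
        history.append(fitness(best))
        # Raise the mutation rate if there was no gain over ten generations.
        stalled = g >= 10 and history[-1] >= history[-11]
        rate = 0.5 if stalled else 0.005
        new_pool = [best]  # the best gene survives unmutated
        while len(new_pool) < population:
            a, b = best_of_three(pool), best_of_three(pool)
            cut = rng.randrange(1, len(xs))  # crossover point, 0 < cut < n
            child = a[:cut] + b[cut:]
            new_pool.append([rng.choice(TONES) if rng.random() < rate else t
                             for t in child])
        pool = new_pool
    best = min(pool, key=fitness)
    return best, fitness(best), history

# Artificial data generated from a known (hypothetical) tone sequence.
truth = ["H", "L", "!H", "L", "H", "!L", "H", "L"]
h, l, d = 96.0, 88.0, 0.72
xs = [130.0]
for t_prev, t in zip(truth, truth[1:]):
    xs.append(predict(xs[-1], t_prev, t, h, l, d))
best, error, history = genetic_search(xs, h, l, d)
print(best, error)
```

Because the best gene is always carried over, the best evaluation can never get worse from one generation to the next. The returned transcription need not equal the generating sequence, since several transcriptions may fit the same F0 values.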

An Annealing Algorithm 

As with genetic algorithms, simulated annealing
(van Laarhoven & Aarts, 1987) is a combinatorial
optimisation technique based on an analogy with
a natural process. Annealing is the heating and
slow cooling of a solid which allows the formation
of regular crystalline structure having a minimum
of excess energy. In its early stages when the
temperature is high, annealing search resembles
random search. There is so much free energy in the
system that a transition to a higher energy state
is highly probable. As the temperature decreases
the search begins to resemble hill-climbing. Now
there is much less free energy and so transitions
to higher energy states are less and less likely. In
what follows, I explain some of the parameters of
annealing search as used in the current
implementation.

temperature At the start of the search the
temperature, t, is set to 1. During the search, the
temperature is reduced at a rate set by the 'cooling
rate' parameter, until it reaches a value less
than 10^-6.

perturbation At each step of the search, the
current state is perturbed by an amount which
depends on the temperature. The temperature
determines the fraction of the search space that
is covered by a single perturbation step. For
a tone sequence of length n, we randomly reset
the worst n·t tones according to
$(V_{t_{i-1} t_i}(x_{i-1}) - x_i)^2$. For the parameters we
proceed as follows, here exemplified for h. First, set
$p = t\,(h_{max} - h_{min})$. Now, add to h a random number
in the range [-p, p] and check that the result is
still in the range $[h_{min}, h_{max}]$.

equilibrium At each temperature, the system is 
required to reach 'thermal equilibrium' before 
the temperature is lowered. In the present con- 
text, equilibrium is reached if no more than one 
of the last eight perturbations yielded a new 
state that was accepted. 

free energy function This is the amount of
available energy for transitions to higher energy
states. In the current system, it is the distribution
$-1000 \cdot t \cdot \log(p)$, where p is a uniform random
variable in the range (0, 1]. If the energy
difference Δ between an old and a new state is
less than the available energy, then the
transition is accepted. The factor of 1000 is intended
to scale the energy distribution to typical values
of the evaluation function.
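
The acceptance rule implied by the free energy function can be made concrete. Since p is uniform on (0, 1], the condition Δ < -1000·t·log(p) holds with probability exp(-Δ/(1000·t)) for positive Δ. The sketch below checks this empirically; the function names and the sample values Δ = 50, t = 0.1 are hypothetical.

```python
import math
import random

def accept_probability(delta, t, scale=1000.0):
    """Probability that a move with energy difference delta is accepted."""
    if delta <= 0:
        return 1.0
    return math.exp(-delta / (scale * t))

def accepts(delta, t, rng, scale=1000.0):
    """Draw the available free energy and compare it with delta."""
    p = 1.0 - rng.random()  # uniform on (0, 1]
    return delta < -scale * t * math.log(p)

rng = random.Random(0)
trials = 20000
frequency = sum(accepts(50.0, 0.1, rng) for _ in range(trials)) / trials
print(frequency, accept_probability(50.0, 0.1))
```

At t = 1 almost any uphill move is accepted, whereas near the final temperature practically none is, which matches the description of the search moving from random search to hill-climbing.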

Now the algorithm itself is presented: 

procedure annealing_search
begin
  initialise Trans, NewTrans, BestTrans;
  randomise Trans;
  t := 1;
  while t > 0.000001 do
  begin
    repeat
      NewTrans := perturb(Trans, t);
      Δ := evaluate(NewTrans) - evaluate(Trans);
      { accept improvements, and worse states if energy allows }
      if Δ < 0 or exp(-Δ/(1000*t)) > random(0,1) then
        Trans := NewTrans;
      if evaluate(Trans) < evaluate(BestTrans) then
        BestTrans := Trans;
    until equilibrium_reached;
    Trans := BestTrans;
    t := t / 1.2;
  end;
  write Trans;
end

The program is made up of two loops. The 
outer loop simply iterates through the tempera- 
ture range, beginning with a temperature of 1 and 
steadily decreasing it until it gets very close to
zero. The nested loop performs the task of reaching
thermal equilibrium at each temperature. The
first step is to perturb the previous transcription 
to make a new one. Notice that the temperature t 
is a parameter of the perturb function. Next, the 
difference A between the old and new evaluations 
is calculated. If the new transcription has a bet- 
ter evaluation than the old one, then A is negative. 
Next, the program accepts the new transcription 
if (i) A is negative or (ii) A is positive and there 
is sufficient free energy in the system to allow the 
worse transcription to be accepted. Finally, we 
check if the new transcription is better than the 
best transcription found so far (BestTrans) and if
so, we set BestTrans to be the new transcription. 
Once equilibrium is reached, the current transcrip- 
tion is set to be the best transcription found so far, 
and the search continues. 
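
The two nested loops described above can be sketched in Python as follows. This is a minimal illustration rather than the C++ implementation reported in this paper: the function names, the toy evaluation and perturbation functions, and the `max_steps` cap (added so that the inner loop always terminates) are all assumptions made for the example. In particular, the toy perturbation resets randomly chosen positions, not the worst-fitting ones.

```python
import math
import random
from collections import deque


def annealing_search(start, evaluate, perturb, rng,
                     cooling=1.2, t_min=1e-6, scale=1000.0, max_steps=200):
    """Return (best_state, best_evaluation) found by annealing from `start`."""
    trans, trans_eval = start, evaluate(start)
    best, best_eval = trans, trans_eval
    t = 1.0
    while t > t_min:
        recent = deque(maxlen=8)  # outcomes of the last eight perturbations
        for _ in range(max_steps):  # cap is an added safeguard (assumption)
            new = perturb(trans, t, rng)
            new_eval = evaluate(new)
            delta = new_eval - trans_eval
            # 1 - rng.random() lies in (0, 1], the range given for p
            accepted = (delta < 0 or
                        math.exp(-delta / (scale * t)) > 1.0 - rng.random())
            if accepted:
                trans, trans_eval = new, new_eval
                if trans_eval < best_eval:
                    best, best_eval = trans, trans_eval
            recent.append(accepted)
            if len(recent) == 8 and sum(recent) <= 1:
                break  # 'thermal equilibrium': at most one of eight accepted
        trans, trans_eval = best, best_eval  # resume from the best state so far
        t /= cooling
    return best, best_eval


# Toy stand-ins (hypothetical): states are digit lists, lower is better.
TARGET = [3, 1, 4, 1, 5]


def toy_evaluate(state):
    return sum((a - b) ** 2 for a, b in zip(state, TARGET))


def toy_perturb(state, t, rng):
    new = list(state)
    k = max(1, int(len(new) * t))  # reset about n*t positions, at least one
    for i in rng.sample(range(len(new)), k):
        new[i] = rng.randrange(10)
    return new


best, score = annealing_search([0] * 5, toy_evaluate, toy_perturb,
                               random.Random(0))
```

With a cooling divisor of 1.2 and a stopping threshold of 0.000001, the outer loop visits 76 temperature levels.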
Figure 6: Performance results (no upstep)



Performance Results 

Both the genetic and annealing search algorithms 
have been implemented in C++. In this section, the 
performance of the two implementations is com- 
pared. Performance statistics are based on 1,200 
executions of each program. Search parameters 
were set so that each execution took around 5 sec- 
onds on a Sun Sparc 10. Three performance trials 
were undertaken. 

Trial 1: Artificial Data. In the first trial, both
programs generated random sequences of tones,
then computed the corresponding F0 sequence using
V, then set about transcribing the F0 sequence.
Since these sequences were ideal, the best
possible evaluation for a transcription was zero.
The performance of the programs could then be
measured to see how close they came to finding
the optimal solution. Each program was tested on
F0 sequences of length 5, 10, 15 and 20. For each
length, each program transcribed 100 randomly
generated sequences. The results are displayed in
Figure 6. Each pair of bars corresponds to a given
transcription length. The left member of each pair
is for the genetic search program, while the right
member is for the annealing search program.

The heavily shaded bars corresponding to
evaluations less than 1 are the most important.
These indicate the number of times out of 100 that
the programs found a transcription with an
evaluation less than 1. This evaluation means that
the average of the squared difference between the
predicted F0 values and the actual F0 values was
less than 1 Hz. Observe that the annealing search
program performs significantly better in all cases.
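
To make the threshold concrete, the following sketch computes an evaluation as a plain mean squared difference. It assumes that the evaluation is exactly this average and nothing more; the function name is hypothetical. The two sequences are taken from the multiple-solutions example later in this paper (the input sequence and the predicted values of the first reported solution).

```python
def evaluation(predicted, actual):
    """Mean squared difference between predicted and observed F0 values."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


observed = [201, 215, 201, 173, 163, 201, 173]
predicted = [201, 215, 201, 172, 163, 201, 172]

# Two of the seven values differ by 1 Hz, so the result is 2/7.
score = evaluation(predicted, observed)
```

Under this assumption the example scores about 0.29, which would fall in the heavily shaded category.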



Note that the mutation operation in the genetic 
search program treats each bit in the parameter 
encodings equally, while the perturbation opera- 
tion in the annealing search program is sensitive 
to the distinction between more significant vs. less 
significant bits. This may explain the better con- 
vergence behaviour of the annealing search. 

Notice also in Figure 6 that performance 
does not degrade with transcription length as the 
length doubles from 10 to 20. This is probably 
because a randomly generated sequence will con- 
tain downsteps on every second tone (on average) 
causing a general downtrend in the F0 values and
severely limiting the combinatorial explosion of 
possible transcriptions. 

Trial 2: Artificial Data with Upstep. Trial 2
was the same as trial 1 except that this time
upstep was permitted as well. The results are
displayed in Figure 7. Again the annealing program
fares better than the genetic program, as the
bars corresponding to evaluations less than 1
show. For both programs, however, observe that
the performance degrades more uniformly than in
trial 1, probably because the inclusion of upstep
greatly increases the number of possible
transcriptions (and hence, the number of local optima).

Figure 7: Performance results (upstep)

Figure 8: Performance results for actual data

Trial 3: Actual Data. The final trial involved
real data, including data from the utterance given
in Figure 1. This trial involved four subtrials. The
first and second had F0 sequences of length 10,
while the third and fourth had length 18 and 19.
The first and second sequences were taken by
extracting the initial 10 F0 values from the third and
fourth sequences, thereby avoiding the asymptotic
behaviour of the longer sequences. The data is
tabulated below, and it comes from the sentences
in (5).



Trial  F0 sequence

1      219, 168, 183, 150, 160, 136, 144, 123, 131, 115
2      205, 224, 167, 200, 156, 175, 136, 156, 127, 140
3      219, 168, 183, 150, 160, 136, 144, 123, 131, 115,
       122, 107, 113, 105, 118, 100, 113, 95
4      205, 224, 167, 200, 156, 175, 136, 156, 127, 140,
       118, 129, 109, 119, 103, 120, 102, 111, 95

Performance results are given in Figure 8. Notice 
that the interpretation of the shading in this figure 
is different from that in previous figures. This is 
because evaluations near zero were less likely with 
real data. In fact, the annealing program never 
found an evaluation less than 3 while the genetic 
program never found an evaluation less than 4. 



Since the programs performed about equally 
on finding transcriptions with an evaluation less 
than 7, I shall display these transcriptions along 
with an indication of how many times each 
program found the transcription (G = genetic, 
A = annealing). I give transcriptions which oc- 
curred at least twice in one of the programs, dur- 
ing 100 executions of each. 

Trial 1: Transcriptions G A 
HiLiHiLiHiLiHiLiHiL 27 37 
HiLiHiLiHLiHiLjHL 7 
HiLiHLiHLjHLiHL 3 
HiLHiLHiLHiLHiL 20 2 
HLiHLiHLiHLiHL 24 39 

Trial 2: Transcriptions G A 
LiHiLHiLiHiLiHiLiH 5 
LiHiLHiLiHiLHiLiH 66 54 

Trial 3: Transcriptions G A 

HjLiHiLiHLiHiLiHLjHiLiHLHiLHiL 11 
HiLiHLiHLiHLiHLiHiLiHLHiLHiL 1 2 
HiLiHLiHLiHLiHLiHLiHLHiLHiL 10 14 
HLiHLiHLiHLiHLiHLiHLHiLHjL 30 56 

Trial 4: Transcriptions G A 
LiHiLHiLiHiLHiLiHiLiHiLiHiLHiLiHiL 60 29 

LiHiLHiLiHiLHjLiHLiHiLiHiLHiLiHiL 5 19 

LiHiLHiLiHiLHiLjHLiHiLiHLHiLiHiL 7 7 

LiHiLHiLiHiLHiLiHiLiHiLiHLHiLiHiL 4 

LjHiLHiLiHiLHiLiHLiHLiHiLHiLiHiL 3 

LiHiLHiLiHiLHjLiHLiHLiHLHiLiHiL 6 

The results from trial 1 deserve special attention. 
In trial 1, three transcriptions were found by both 
programs. The best evaluations found are given 
below: 



Transcription                   E            h    l    d

H L ↓H L ↓H L ↓H L ↓H L         [illegible]  107  100  0.68
H ↓L H ↓L H ↓L H ↓L H ↓L        [illegible]  90   93   0.76
H ↓L ↓H ↓L ↓H ↓L ↓H ↓L ↓H ↓L    [illegible]  107  100  0.82



It is striking to note that the first two
transcriptions above are what Hyman and Stewart
(respectively) would have given as transcriptions for the
abstract F0 sequence 1 3 2 4 3 5 4 6 5 7. This is
demonstrated in (7a,b). The third transcription
points to another possibility, given in (7c).



(7) a. Hyman's transcription scheme

        H  L ↓H  L ↓H  L ↓H  L ↓H  L
        1  3  2  4  3  5  4  6  5  7

    b. Stewart's transcription scheme

        H ↓L  H ↓L  H ↓L  H ↓L  H ↓L
        1  2  1  2  1  2  1  2  1  2
        0  1  1  2  2  3  3  4  4  5
        1  3  2  4  3  5  4  6  5  7

    c. Novel transcription scheme

        H ↓L ↓H ↓L ↓H ↓L ↓H ↓L ↓H ↓L
        [intermediate rows illegible]
        1  3  2  4  3  5  4  6  5  7



Therefore, there are encouraging signs that
the program is living up to its promise of
producing alternative, equally acceptable transcriptions,
as desired from an analytical standpoint.
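
The arithmetic behind the three schemes can be checked with a short sketch. It assumes that each tone has a fixed level, that every downstep adds a constant to a running register, and that the abstract pitch is the sum of the two. Only Stewart's level and register rows are given explicitly in (7b); the levels used here for Hyman's scheme (H = 1, L = 3) and for the novel scheme (H = 1, L = 2.5, half-unit downsteps) are assumptions chosen because they reproduce the sequence, and the function name is hypothetical.

```python
def pitch_levels(tones, h_level, l_level, step):
    """Abstract pitch of each tone: tone level plus accumulated downstep."""
    register = 0
    levels = []
    for tone in tones:
        if tone.startswith("↓"):
            register += step  # each downstep lowers the register by `step`
        base = h_level if tone.endswith("H") else l_level
        levels.append(base + register)
    return levels


hyman = ["H", "L", "↓H", "L", "↓H", "L", "↓H", "L", "↓H", "L"]
stewart = ["H", "↓L"] * 5
novel = ["H"] + ["↓L", "↓H"] * 4 + ["↓L"]

hyman_pitch = pitch_levels(hyman, h_level=1, l_level=3, step=1)
stewart_pitch = pitch_levels(stewart, h_level=1, l_level=2, step=1)
novel_pitch = pitch_levels(novel, h_level=1, l_level=2.5, step=0.5)
```

Under these assumptions all three transcriptions yield 1 3 2 4 3 5 4 6 5 7, which is why the search cannot choose between them on the basis of the F0 values alone.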



Multiple Solutions 

Although we have seen more than one transcription
for a given F0 sequence, it is inconvenient to
be required to run the programs several times in
order to see if more than one solution can be found.
Furthermore, the programs are designed not to get
caught in local optima, which is a problem since
interesting alternative transcriptions may actually
be local optima. Therefore, both programs are
set up to report the k best solutions, where the
user specifies the number of solutions desired. The
program ensures that the same area of the search
space is not re-explored by subsequent searches.
This is done by defining a distance metric on
transcriptions which counts the number of tones in one
transcription that have to be changed in order to
make it identical to the other transcription. That
part of the search space within a distance of n/3
from any previously found solution is not explored
again. The programs give up before finding k
solutions if 5 randomly generated transcriptions all
fall within distance n/3 of previous solutions.
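
The exclusion mechanism just described might be sketched as follows. The function names are hypothetical, a tone together with its step mark is compared as a single unit, and 'within a distance of n/3' is read as 'less than or equal to n/3'; each of these is an assumption, not a detail confirmed by the implementation.

```python
import random


def distance(a, b):
    """Number of positions at which two equal-length transcriptions differ."""
    if len(a) != len(b):
        raise ValueError("transcriptions must have the same length")
    return sum(x != y for x, y in zip(a, b))


def too_close(candidate, found):
    """True if the candidate lies within n/3 of any earlier solution."""
    limit = len(candidate) / 3
    return any(distance(candidate, s) <= limit for s in found)


def fresh_start(found, n, tones, rng, attempts=5):
    """Random transcription outside the excluded regions, or None to give up."""
    for _ in range(attempts):
        candidate = [rng.choice(tones) for _ in range(n)]
        if not too_close(candidate, found):
            return candidate
    return None


first = ["H", "L", "↓H", "L", "↓H", "L"]
near = ["H", "↓L", "↓H", "L", "↓H", "L"]   # differs in one tone
far = ["L", "H", "L", "H", "L", "H"]        # differs in all six tones
```

Here `too_close(near, [first])` holds because one change is within the limit of two, whereas `far` would be admitted as the starting point of a new search.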

Now, consider the following randomly gener- 
ated sequence of tones: 



↑H  ↑H  ↓H  L   ↓L  ↑H  L
201 215 201 173 163 201 173

h: 107  l: 98
d: 0.87  E: 1



The annealing program was set the task of
finding ten transcriptions of this tone sequence. The
program was run only twice, and it reported the
following solutions with evaluations less than or
equal to 1. Both runs of the program found
the same solutions, and in the same order. (Note
that two transcriptions are taken to be the same if
they differ only in that one or both begin with an
initial upstep or downstep; this has no effect on
the phonetic interpretation.) In the following
displays, the predicted F0 values are given below
each solution to facilitate comparison with the
input sequence.



TH TH TH L TL TH L
201 215 201 172 163 201 172
h: 101  l: 92
d: 0.88  E: [illegible]

TH TH TH TL TL H TL
201 215 201 174 163 201 174
h: 109  l: 94
d: 0.87  E: [illegible]

L TH TH L TL TH L
201 217 201 174 163 201 174
h: 105  l: 97
d: 0.86  E: [illegible]



H TH TH L TL TH L
201 214 201 173 164 201 173
h: 110  l: 100
d: 0.88  E: [illegible]

TH TH TH TL TL H TL
201 215 201 174 164 201 174
h: 102  l: 88
d: 0.88  E: [illegible]

jL jH jH L TL TH L
201 217 201 174 163 201 174
h: 104  l: 96
d: 0.86  E: [illegible]



Since all executions to this point have been
based on the first table of R values, it was decided
to try a test with the second table of R values
to see if the performance was different.
Interestingly, the third solution in both of the above
executions was not found, though two new solutions
were found.



TH TH TH L TL TH L
201 216 201 173 162 201 173
h: 94  l: 80
d: 0.88  E: [illegible]

L TH TH L TL TH L
201 215 201 174 163 201 174
h: 97  l: 84
d: 0.88  E: [illegible]

TH TH TH TL TL H TL
201 215 201 174 163 201 174
h: 100  l: 81
d: 0.88  E: [illegible]

TL H L TH L TL TH
201 214 201 173 163 201 173
h: 92  l: 86
d: 0.67  E: [illegible]



TH TH TH L TL TH L
201 216 201 173 162 201 173
h: 107  l: 92
d: 0.87  E: [illegible]

L H L TH L TL TH
201 214 201 173 163 201 173
h: 99  l: 93
d: 0.65  E: [illegible]

TL TH TH L TL TH L
201 217 202 174 163 202 174
h: 90  l: 78
d: 0.88  E: [illegible]



Observe that the value of d in the above
solutions clusters around 0.66 and 0.87. Similar
clustering may be occurring with the ratio h/l.
However, an analysis of the relationship between the
kinds of solutions found, the two R tables and
the parameter values h, l and d has not been
attempted.

Areas for Further Improvement 

It is rather unsatisfying that the performance of
the two programs is heavily dependent on the
setting of several search parameters, and it seems to
be a combinatorial optimisation problem in itself
to find good parameter settings. My trial-and-error
approach will not necessarily have found
optimal parameter values, and so it would be
premature to conclude from the performance
comparison that annealing search is better than genetic
search for the problem of tone transcription. A
more thoroughgoing comparison of these two
approaches to the problem needs to be undertaken.

Since the parameters are continuous variables,
and since the evaluation function (which we could
write as $\mathcal{E}_{T,X}(h,l,d)$) is a smoothly
continuous function in h, l and d, it would be
worthwhile to try other (deterministic) search
methods for optimising h, l and d, once a candidate
tone transcription T has been found.
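
As one illustration of what such a deterministic method could look like, the sketch below applies a simple compass (pattern) search to three bounded parameters. It is not a method used or evaluated in this paper. The objective is a toy quadratic standing in for the evaluation function with T and X held fixed, and the bounds, step sizes and function names are hypothetical.

```python
def compass_search(f, x0, steps, bounds, tol=1e-6):
    """Minimise f by trying +/- one step per coordinate, halving on failure."""
    x = list(x0)
    fx = f(x)
    steps = list(steps)
    while max(steps) > tol:
        improved = False
        for i in range(len(x)):
            for direction in (1.0, -1.0):
                y = list(x)
                lo, hi = bounds[i]
                # keep the trial point inside the allowed parameter range
                y[i] = min(max(y[i] + direction * steps[i], lo), hi)
                fy = f(y)
                if fy < fx:
                    x, fx, improved = y, fy, True
        if not improved:
            steps = [s / 2 for s in steps]  # refine the mesh
    return x, fx


def toy_evaluation(params):
    """Stand-in objective (assumption) with its minimum at h=107, l=98, d=0.87."""
    h, l, d = params
    return (h - 107) ** 2 + (l - 98) ** 2 + 1000 * (d - 0.87) ** 2


best, score = compass_search(
    toy_evaluation,
    [100.0, 90.0, 0.5],
    steps=[8.0, 8.0, 0.1],
    bounds=[(80.0, 130.0), (70.0, 120.0), (0.0, 1.0)],
)
```

Because every accepted move strictly lowers the objective, the procedure is repeatable: the same starting point always leads to the same result, in contrast to the two non-deterministic searches.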

Finally, it would be interesting to integrate a 
system like either of the ones presented here into a 
speech workstation. As the phonologist identifies 
salient points with a cursor the system would do 
the transcription, incrementally and interactively. 

CONCLUSION 

This paper began with a discussion of the
problem of relating tone transcriptions to their
physical counterparts, namely F0 traces. I showed that
it is desirable for phonologists working on tone
to use sequences of F0 values as their primary
data, rather than impressionistic transcriptions
which make (usually implicit) assumptions about
F0 scaling. I provided an F0 prediction function V
which estimated the F0 value of a tone, given the
F0 value of the previous tone and the identities
of the two tones. I presented instrumental data
from Bamileke Dschang and showed how the
function could be specialised for this language. The
function was then incorporated into the evaluation
functions of two implemented non-deterministic
search algorithms. The performance results were
encouraging and demonstrate the promise of
automated tone transcription.

ACKNOWLEDGEMENTS 

This research is funded by the UK Economic
and Social Research Council, under grant R00023
4439, A Computational Model for the
Phonology-Phonetics Interface in Tone Languages. I am
indebted to SIL Cameroon for their logistical
support on my field trip in September and October
of 1993, during which the data presented in the
paper (and much other data besides) was
gathered, and especially to Nancy Haynes and Gretchen
Harro for helping me collect the data and
Jean-Claude Gnintedem who endured many recording
sessions. I am grateful to John Coleman, Michael
Gasser and Marie South for helpful comments on
an earlier version of this paper. The F0 data was
extracted using the ESPS Waves+ package in the
Edinburgh University Phonetics Laboratory.

References 

Bird, S. & Stegen, O. (1993). Tone in the 
Bamileke Dschang Associative Construction: 
An Electrolaryngographic Study and Com- 
parison with Hyman (1985). RP 57, Univer- 
sity of Edinburgh, Centre for Cognitive Sci- 
ence. 

Clements, G. N. (1979). The description of 
terraced-level tone languages. Language, 55, 
536-558. 

Connell, B. & Ladd, D. R. (1990). Aspects of pitch 
realisation in Yoruba. Phonology, 7, 1-29. 

Ellison, T. M. (1992). Machine Learning of Phono- 
logical Structure. PhD thesis, University of 
Western Australia. 

Goldberg, D. E. (1989). Genetic Algorithms in
Search, Optimization, and Machine Learning.
Addison-Wesley.

Huang, X. D., Ariki, Y., & Jack, M. (1990). 
Hidden Markov Models for Speech Recogni- 
tion. Edinburgh Information Technology Se- 
ries. Edinburgh University Press. 

Hyman, L. M. (1979). A reanalysis of tonal down- 
step. Journal of African Languages and Lin- 
guistics, 1, 9-29. 

Hyman, L. M. (1985). Word domains and
downstep in Bamileke-Dschang. Phonology
Yearbook, 2, 45-83.

Hyman, L. M. & Schuh, R. G. (1974).
Universals of tone rules: evidence from West Africa.
Linguistic Inquiry, 5, 81-115.

Liberman, M., Schultz, J. M., Hong, S., & Okeke, 
V. (1993). The Phonetic Interpretation of 
Tone in Igbo. Phonetica, 50, 147-160. 

Pierrehumbert, J. & Beckman, M. (1988). 
Japanese Tone Structure. Cambridge Mass.: 
MIT Press. 

South, M. C., Wetherill, G. B., & Tham, M. T.
(1993). Hitch-hiker's guide to genetic
algorithms. Journal of Applied Statistics, 20,
153-175.

Stewart, J. M. (1993). Dschang and Ebrie as Akan- 
type total downstep languages. In H. van der 
Hulst & K. Snider (Eds.), The Phonology of 
Tone - The Representation of Tonal Register 
(pp. 185-244). Berlin; New York: Mouton de 
Gruyter. Linguistic models. Volume 17. 



van Laarhoven, P. J. M. & Aarts, E. H. L. (1987).
Simulated Annealing. Dordrecht: Reidel.



